78 research outputs found
A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology
The widespread availability of high-dimensional biological data has made the
simultaneous screening of numerous biological characteristics a central
statistical problem in computational biology. While the dimensionality of such
datasets continues to increase, the problem of teasing out the effects of
biomarkers in studies measuring baseline confounders while avoiding model
misspecification remains only partially addressed. Efficient estimators
constructed from data adaptive estimates of the data-generating distribution
provide an avenue for avoiding model misspecification; however, in the context
of high-dimensional problems requiring simultaneous estimation of numerous
parameters, standard variance estimators have proven unstable, resulting in
unreliable Type-I error control under standard multiple testing corrections. We
present the formulation of a general approach for applying empirical Bayes
shrinkage approaches to asymptotically linear estimators of parameters defined
in the nonparametric model. The proposal applies existing shrinkage estimators
to the estimated variance of the influence function, allowing for increased
inferential stability in high-dimensional settings. A methodology for
nonparametric variable importance analysis for use with high-dimensional
biological datasets with modest sample sizes is introduced and the proposed
technique is demonstrated to be robust in small samples even when relying on
data adaptive estimators that eschew parametric forms. Use of the proposed
variance moderation strategy in constructing stabilized variable importance
measures of biomarkers is demonstrated by application to an observational study
of occupational exposure. The result is a data adaptive approach for robustly
uncovering stable associations in high-dimensional data with limited sample
sizes.
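The variance moderation idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a limma-style shrinkage rule in which each parameter's influence-function variance is pulled toward a common prior variance. The function name `moderate_variance`, the simulated data, and the prior values `prior_df` and `prior_var` are all hypothetical; in practice the prior would be estimated from the data by an existing empirical Bayes procedure rather than fixed by hand.

```python
import numpy as np


def moderate_variance(eif, prior_df, prior_var):
    """Shrink per-parameter influence-function variances toward a prior.

    eif:       (n, p) array of estimated influence function values,
               one column per biomarker (hypothetical input).
    prior_df:  prior degrees of freedom d0 (assumed given here).
    prior_var: prior variance s0^2 (assumed given here).
    """
    n = eif.shape[0]
    sample_var = eif.var(axis=0, ddof=1)
    resid_df = n - 1
    # Weighted average of the prior variance and each sample variance.
    return (prior_df * prior_var + resid_df * sample_var) / (prior_df + resid_df)


# Simulated stand-in for a small-sample, high-dimensional problem.
rng = np.random.default_rng(0)
n, p = 20, 500
eif = rng.normal(scale=rng.uniform(0.5, 2.0, size=p), size=(n, p))
psi = rng.normal(scale=0.3, size=p)  # placeholder point estimates

sample_var = eif.var(axis=0, ddof=1)
prior_var = sample_var.mean()  # crude placeholder for an estimated prior
prior_df = 4.0                 # arbitrary illustrative value

moderated_var = moderate_variance(eif, prior_df, prior_var)

# Test statistics with and without moderation.
t_raw = psi / np.sqrt(sample_var / n)
t_mod = psi / np.sqrt(moderated_var / n)

print("spread of raw variances:      ", sample_var.std())
print("spread of moderated variances:", moderated_var.std())
```

Because each moderated variance is a weighted average of the prior and the sample variance, unusually small variance estimates are inflated and unusually large ones are deflated, which is the source of the added stability in small samples.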
Revisiting the propensity score's central role: Towards bridging balance and efficiency in the era of causal machine learning
About forty years ago, in a now-seminal contribution, Rosenbaum & Rubin
(1983) introduced a critical characterization of the propensity score as a
central quantity for drawing causal inferences in observational study settings.
In the decades since, much progress has been made across several research
fronts in causal inference, notably including the re-weighting and matching
paradigms. Focusing on the former and specifically on its intersection with
machine learning and semiparametric efficiency theory, we re-examine the role
of the propensity score in modern methodological developments. As the
contribution of Rosenbaum & Rubin (1983) spurred a focus on the balancing property of the
propensity score, we re-examine the degree to which and how this property plays
a role in the development of asymptotically efficient estimators of causal
effects; moreover, we discuss a connection between the balancing property and
efficient estimation in the form of score equations and propose a score test
for evaluating whether an estimator achieves balance.
Comment: Accepted for publication in a forthcoming special issue of
Observational Studies
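The balancing property discussed in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, and it is not the score test the authors propose. It uses a single simulated covariate and a propensity score that is known by construction, then compares covariate means across treatment arms before and after inverse probability weighting. All variable names and numeric settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1983)
n = 50_000

# Simulated observational data (all values are illustrative assumptions).
w = rng.normal(size=n)                       # one baseline covariate
propensity = 1.0 / (1.0 + np.exp(-0.8 * w))  # true P(A = 1 | W), known here
a = rng.binomial(1, propensity)              # treatment indicator

# Unweighted comparison: the covariate differs between arms.
raw_diff = w[a == 1].mean() - w[a == 0].mean()

# Inverse probability weights built from the propensity score.
weights = np.where(a == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))

# Weighted covariate means in each arm, with weights normalized within arm.
treated_mean = np.average(w[a == 1], weights=weights[a == 1])
control_mean = np.average(w[a == 0], weights=weights[a == 0])
weighted_diff = treated_mean - control_mean

print(f"unweighted mean difference: {raw_diff:.3f}")
print(f"weighted mean difference:   {weighted_diff:.3f}")
```

With the true propensity score, the weighted difference should be close to zero while the unweighted difference is not. In applied work the propensity score must be estimated, and how well an estimator achieves this kind of balance is the question the abstract raises.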
A nonparametric framework for treatment effect modifier discovery in high dimensions
Heterogeneous treatment effects are driven by treatment effect modifiers,
pre-treatment covariates that modify the effect of a treatment on an outcome.
Current approaches for uncovering these variables are limited to
low-dimensional data, data with weakly correlated covariates, or data generated
according to parametric processes. We resolve these issues by developing a
framework for defining model-agnostic treatment effect modifier variable
importance parameters applicable to high-dimensional data with arbitrary
correlation structure, deriving one-step, estimating equation and targeted
maximum likelihood estimators of these parameters, and establishing these
estimators' asymptotic properties. This framework is showcased by defining
variable importance parameters for data-generating processes with continuous,
binary, and time-to-event outcomes with binary treatments, and deriving
accompanying multiply-robust and asymptotically linear estimators. Simulation
experiments demonstrate that these estimators' asymptotic guarantees are
approximately achieved in realistic sample sizes for observational and
randomized studies alike. This framework is applied to gene expression data
collected for a clinical trial assessing the effect of a monoclonal antibody
therapy on disease-free survival in breast cancer patients. Genes predicted to
have the greatest potential for treatment effect modification have previously
been linked to breast cancer. An open-source R package implementing this
methodology, unihtee, is made available on GitHub at
https://github.com/insightsengineering/unihtee
Disability-adjusted life-years (DALYs) for 315 diseases and injuries and healthy life expectancy (HALE) in Iran and its neighboring countries, 1990–2015
BACKGROUND: Summary measures of health are essential in making estimates of health status that are comparable across time and place. They can be used for assessing the performance of health systems, informing effective policy making, and monitoring the progress of nations toward achievement of sustainable development goals. The Global Burden of Diseases, Injuries, and Risk Factors Study 2015 (GBD 2015) provides disability-adjusted life-years (DALYs) and healthy life expectancy (HALE) as main summary measures of health. We assessed the trends of health status in Iran and 15 neighboring countries using these summary measures.
METHODS: We used the results of GBD 2015 to present the levels and trends of DALYs, life expectancy (LE), and HALE in Iran and its 15 neighboring countries from 1990 to 2015. For each country, we assessed the ratio of observed levels of DALYs and HALE to those expected based on socio-demographic index (SDI), an indicator composed of measures of total fertility rate, income per capita, and average years of schooling.
RESULTS: All-age numbers of DALYs reached over 19 million years in Iran in 2015. The all-age number of DALYs has remained stable during the past two decades in Iran, despite the decreasing trends in all-age and age-standardized rates. The all-cause DALY rates decreased from 47,200 in 1990 to 28,400 per 100,000 in 2015. The share of non-communicable diseases in DALYs increased in Iran (from 42% to 74%) and all of its neighbors between 1990 and 2015; the pattern of change is similar in almost all 16 countries. The DALY rates for NCDs and injuries in Iran were higher than global rates and the average rate in High Middle SDI countries, while those for communicable, maternal, neonatal, and nutritional disorders were much lower in Iran. Among men, cardiovascular diseases ranked first in all countries of the region except for Bahrain. Among women, they ranked first in 13 countries. Life expectancy and HALE show a consistent increase in all countries. Still, there are dissimilarities indicating a generally low LE and HALE in Afghanistan and Pakistan and high expectancy in Qatar, Kuwait, and Saudi Arabia. Iran ranked 11th in terms of LE at birth and 12th in terms of HALE at birth in 1990 which improved to 9th for both metrics in 2015. Turkey and Iran had the highest increase in LE and HALE from 1990 to 2015 while the lowest increase was observed in Armenia, Pakistan, Kuwait, Kazakhstan, Russia, and Iraq.
CONCLUSIONS: The levels and trends in causes of DALYs, life expectancy, and HALE generally show similarities between the 16 countries, although differences exist. The differences observed between countries can be attributed to a myriad of determinants, including social, cultural, ethnic, religious, political, economic, and environmental factors as well as the performance of the health system. Investigating the differences between countries can inform more effective health policy and resource allocation. Concerted efforts at national and regional levels are required to tackle the emerging burden of non-communicable diseases and injuries in Iran and its neighbors.
- …